Clustering Text Data Streams - A Tree based Approach with Ternary Function and Ternary Feature Vector
Authors
Abstract
Data is the primary concern in data mining. Data stream mining is gaining considerable practical significance given the huge volumes of online data generated by sensors, Internet Relay Chat, Twitter, Facebook, and online bank or ATM transactions. The primary constraints in finding frequent patterns in data streams are that the data may be scanned only once, memory is limited, and processing time must be low. Such dynamically changing data, which we call data streams, is becoming a key challenge. In our present work, the algorithm finds frequent patterns in data streams using a tree-based approach and continuously clusters the text data streams as they are generated, using a newly defined ternary similarity measure.
© 2014 The Authors. Published by Elsevier B.V. Selection and/or peer-review under responsibility of the organizers of ITQM 2014
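The abstract's central constraint, a single scan of the stream with limited memory, can be illustrated with the Misra-Gries frequent-items summary. This is a standard technique swapped in for illustration only: it is not the paper's tree-based algorithm, and the ternary similarity measure is not defined in this abstract, so it is not reproduced. The function name `misra_gries` and the parameter `k` are our own.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass, at most k - 1 counters (Misra-Gries frequent-items summary).

    Illustrative substitute, NOT the paper's tree-based method.
    Every item occurring more than n / k times in a stream of length n is
    guaranteed to be among the returned keys; stored counts may undercount.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:  # each element is seen exactly once
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Table full: decrement every counter (the new item is dropped too)
            # and evict counters that reach zero, keeping memory bounded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, `misra_gries("a b a c a b a d a".split(), k=3)` keeps at most two counters and returns `{'a': 3}`: the word `a` occurs 5 times out of 9, more than 9/3, so it must survive, while its stored count is an underestimate.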
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
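The TF-IDF vectors mentioned in the abstract above weight each term by how often it appears in a document and how rare it is across the collection. A minimal sketch, assuming documents are already tokenized and using the plain weighting tf(t, d) * log(N / df(t)); the function name `tf_idf` is ours.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    Uses raw term frequency and idf(t) = log(N / df(t)). Libraries such as
    scikit-learn smooth the idf and L2-normalize by default, so their
    numbers will differ from these.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors
```

With `docs = [["text", "clustering", "text"], ["text", "classification"]]`, the term `text` occurs in every document and gets weight 0, which illustrates the limitation the abstract points at: the weights reflect word occurrence, not meaning.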
Dictionary-Based Fast Transform for Text Compression with High Compression Ratio
In this paper we introduce a dictionary-based fast lossless text transform algorithm. This algorithm uses a ternary search tree to expedite the transform encoding operation. Based on an efficient dictionary mapping model, the algorithm uses a fast hash function to achieve lightning speed in the transform decoding phase. Results show that the average compression time using the transform algor...
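The abstract above uses a ternary search tree to speed up dictionary lookups during encoding. A generic ternary search tree is sketched below: each node holds one character, with `lo`/`hi` links to smaller/larger sibling characters and an `eq` link to the next character of the key. Mapping each word to an integer dictionary index is our assumption about how such a tree would serve a transform; the class names are hypothetical, not the authors' code.

```python
from typing import Optional


class _Node:
    """One character cell: lo/hi are siblings, eq continues the key."""
    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None
        self.eq: Optional["_Node"] = None
        self.hi: Optional["_Node"] = None
        self.value: Optional[int] = None  # set only where a key ends


class TernarySearchTree:
    """Maps non-empty string keys (e.g. dictionary words) to integer ids."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, key: str, value: int) -> None:
        if not key:
            raise ValueError("keys must be non-empty")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node: Optional[_Node], key: str, i: int, value: int) -> _Node:
        ch = key[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value  # last character: store the payload here
        return node

    def get(self, key: str) -> Optional[int]:
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.value  # None if key is only a prefix
        return None
```

Compared with a hash table, the tree compares one character at a time and never hashes the whole word, and prefixes such as `th` are distinguishable from stored words such as `the`.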
Nearly higher ternary derivations in Banach ternary algebras: An alternative fixed point approach
We say a functional equation is stable if any function g that approximately satisfies the equation is near a true solution of it. Using fixed point methods, we investigate approximately higher ternary derivations in Banach ternary algebras via the Cauchy functional equation $f(\lambda_1 x + \lambda_2 y + \lambda_3 z) = \lambda_1 f(x) + \lambda_2 f(y) + \lambda_3 f(z)$.
Ternary Tree and Clustering Based Huffman Coding Algorithm
In this study, the focus was on the use of a ternary tree rather than a binary tree. Here, a new two-pass algorithm for encoding Huffman ternary tree codes was implemented. In this algorithm we tried to find the codeword length of each symbol, using the concept of Huffman encoding. Huffman encoding is a two-pass problem: the first pass collects the letter frequencies. You need to use ...
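The abstract above describes ternary Huffman coding whose first pass collects letter frequencies. The sketch below computes codeword lengths with the standard n-ary Huffman construction for n = 3: merge the three lightest subtrees repeatedly, padding with a zero-weight dummy symbol so the leaf count is odd. It is the textbook method, not the authors' clustering-based algorithm, and it stops at lengths (the second, encoding pass is omitted); the function name is ours.

```python
import heapq
import itertools
from collections import Counter
from typing import Dict


def ternary_huffman_lengths(text: str) -> Dict[str, int]:
    """Return the ternary-Huffman codeword length of each distinct symbol."""
    freq = Counter(text)  # first pass: collect symbol frequencies
    if len(freq) <= 1:
        # Degenerate inputs: no symbols, or one symbol given a 1-digit code.
        return {s: 1 for s in freq}
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    # Heap entry: (subtree weight, tie-breaker, {symbol: depth inside subtree}).
    heap = [(w, next(tie), {s: 0}) for s, w in freq.items()]
    # Every merge turns 3 nodes into 1, so a full ternary tree needs an odd
    # leaf count; pad with a zero-weight dummy leaf when the count is even.
    if len(heap) % 2 == 0:
        heap.append((0, next(tie), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        total, merged = 0, {}
        for _ in range(3):
            weight, _tb, depths = heapq.heappop(heap)
            total += weight
            for sym, depth in depths.items():
                merged[sym] = depth + 1  # one level deeper under the new parent
        heapq.heappush(heap, (total, next(tie), merged))
    return heap[0][2]
```

For `"aaaabbbccd"` (four symbols, so one dummy is added) this yields lengths `{"a": 1, "b": 1, "c": 2, "d": 2}` in ternary digits.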
A Dictionary-Based Multi-Corpora Text Compression System
In this paper we introduce StarZip, a multi-corpora text compression system, together with its transform engine StarNT. StarNT achieves a better compression ratio than almost all other recent efforts based on BWT and PPM. StarNT is a dictionary-based fast lossless text transform. The main idea is to recode each English word with a representation of no more than three symbols. This transfo...
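To make "no more than three symbols" concrete, the sketch below assigns every word in a frequency-ranked dictionary a distinct codeword of one to three symbols, giving the shortest codes to the most frequent words. The 52-letter codeword alphabet, the ranking, and the helper names are hypothetical choices for illustration; StarNT's actual alphabet, escape handling, and dictionary layout are not given in this abstract.

```python
import string
from typing import Dict, List

# Hypothetical codeword alphabet: the 52 ASCII letters (an assumption).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def codeword(rank: int) -> str:
    """Map a dictionary rank to a distinct codeword of 1, 2, or 3 symbols."""
    for length in (1, 2, 3):
        block = BASE ** length  # number of codewords of this length
        if rank < block:
            digits = []
            for _ in range(length):
                rank, r = divmod(rank, BASE)
                digits.append(ALPHABET[r])
            return "".join(reversed(digits))
        rank -= block  # skip past all shorter codewords
    raise ValueError("dictionary too large for 3-symbol codewords")


def build_codebook(words_by_frequency: List[str]) -> Dict[str, str]:
    """Most frequent words (lowest rank) receive the shortest codewords."""
    return {w: codeword(i) for i, w in enumerate(words_by_frequency)}
```

Under this assumed alphabet, the scheme can address 52 + 52² + 52³ = 143,364 words before it runs out of codewords.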